Filtration of String Proximity Search via Transformation
Abstract
The problem of proximity search in biological databases is addressed. We study vector transformations and apply DFT (Discrete Fourier Transformation) and DWT (Discrete Wavelet Transformation, Haar) dimensionality reduction techniques to DNA sequence proximity search in order to reduce the search time of range queries. Our empirical results on a number of Prokaryote and Eukaryote DNA contig databases demonstrate up to a 50-fold filtration ratio of the search space and up to 13 times faster filtration. The proposed transformation techniques may easily be integrated as a preprocessing phase on top of existing similarity search heuristics such as BLAST [1], PatternHunter [3], FastA [4], and QUASAR [2] to efficiently prune non-relevant sequences. We study the precision of applying dimensionality reduction techniques for faster and more efficient range query searches, and discuss the imposed trade-offs.

References
[1] S. Altschul, W. Gish, W. Miller, E. Myers, and D. J. Lipman. Basic local alignment search tool. J. Mol. Biol., 215:403–410, 1990.
[2] S. Burkhardt, A. Crauser, P. Ferragina, H.P. Lenhof, E. Rivals, and M. Vingron. q-gram based database searching using a suffix array (QUASAR). In RECOMB, pages 77–83, 1999.
[3] B. Ma, J. Tromp, and M. Li. PatternHunter: faster and more sensitive homology search. Bioinformatics, 18(3):440–445, March 2002.
[4] W. R. Pearson. Using the FASTA program to search protein and DNA sequence databases. Methods Mol Biol, 25:365–389, 1994.
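The filtration idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; several details are assumptions made only for illustration: sequences are equal-length windows whose length is a power of two, each window is encoded as four concatenated nucleotide indicator channels, and proximity is measured by Hamming distance (under this encoding the squared Euclidean distance equals twice the Hamming distance). Because the orthonormal Haar transform preserves Euclidean distance, the distance computed on the first k coefficients can only underestimate the true distance, so pruning on it never discards a true match. The names `encode`, `haar`, `reduced_dist2`, and `filter_candidates` are hypothetical.

```python
import math

def encode(seq):
    """Assumed encoding: one 0/1 indicator channel per nucleotide, concatenated."""
    return [1.0 if ch == base else 0.0 for base in "ACGT" for ch in seq]

def haar(vec):
    """Orthonormal Haar transform; len(vec) must be a power of two.
    Coefficients come out ordered from coarsest to finest."""
    out = list(vec)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + diff
        n = half
    return out

def reduced_dist2(a, b, k):
    """Squared Euclidean distance on the first k Haar coefficients.
    This is a lower bound on the squared distance between the full encodings."""
    ca, cb = haar(encode(a))[:k], haar(encode(b))[:k]
    return sum((x - y) ** 2 for x, y in zip(ca, cb))

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def filter_candidates(query, database, radius, k):
    """Keep sequences that might lie within Hamming distance `radius` of the query.
    Survivors still need exact verification; pruned sequences cannot be matches."""
    bound = 2 * radius + 1e-9  # squared Euclidean distance = 2 * Hamming distance
    return [s for s in database if reduced_dist2(query, s, k) <= bound]

if __name__ == "__main__":
    db = ["ACGTACGT", "ACGTACGA", "TTTTTTTT", "GGGGCCCC"]
    candidates = filter_candidates("ACGTACGT", db, radius=1, k=8)
    matches = [s for s in candidates if hamming("ACGTACGT", s) <= 1]
    print(candidates, matches)
```

A smaller k makes the filter cheaper but lets more non-matching candidates through to verification, which is the kind of precision trade-off the abstract refers to.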
Similar resources
Using Transformation Techniques Towards Efficient Filtration of String Proximity Search of Biological Sequences
The problem of proximity search in biological databases is addressed. We study vector transformations and conduct the application of DFT(Discrete Fourier Transformation) and DWT(Discrete Wavelet Transformation, Haar) dimensionality reduction techniques for DNA sequence proximity search to reduce the search time of range queries. Our empirical results on a number of Prokaryote and Eukaryote DNA ...
BFT: A Relational-based Bit Filtration Technique for Efficient Approximate String Joins in Biological Databases
Joining massive tables in relational databases have received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to reduce the curse of dimensionality. This paper proposes a novel approach to map the problem of pairwise whole genome comparison into an approximate join operation in the well-established relational database context. We propose a ...
BFT: Bit Filtration Technique for Approximate String Join in Biological Databases
Joining massive tables in relational databases have received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to reduce the curse of dimensionality. This paper proposes a novel approach to map the problem of pairwise whole-genome comparison into an approximate join operation in the well-established relational database context. We propose a ...
Application of grey GIS filtration to identify the potential area for cement plants in South Khorasan Province, Eastern Iran
Cement-based materials are fundamental resources used in construction. The increase in requests for and consumption of cement products, especially in Iran, indicates that more cement plants should be equipped. This study developed a geographical information system using pairwise comparison based on grey numbers to identify potential sites in which to set up cement plants. A group of five exp...
Distance Based Indexing for String Proximity Search
In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries (asking for strings that are most similar to a query string). The similarity between strings is defined in terms of a distance function determined by the application domain. The most popular string distance measures a...